VM with a GPU in passthrough mode fails to start on HPE servers
ESXi 7.0.3 (Update 3n) fails to start a VM that has a GPU attached to it via passthrough.
The VM fails to start with the error: Module DevicePowerOn power on failed. Failed to start the virtual machine. Device 6:0.0 is not a passthrough device.
The vmkernel log also shows PCIe hot-plug events during VM power-on, for example:
2023-07-31T05:51:43.091Z cpu144:15901592)PCIPassthru: 3873: pcipDevInfo(0x43174d483300) allocated for 0000:a9:00.0
2023-07-31T05:51:43.093Z cpu98:2097948)PCIEHP: 1564: 0000:a8:01.0: hotplug slot:0x1: num reads=1 slot status=0x108.
2023-07-31T05:51:43.093Z cpu98:2097948)PCIEHP: 1496: 0000:a8:01.0: hotplug slot:0x1 (0000:a9:00.0) Adapter removed.
2023-07-31T05:51:43.093Z cpu98:2097948)PCIEHP: 380: 0000:a8:01.0: hotplug slot:0x1: Setting PowerIndicator State BLINKING
2023-07-31T05:51:43.094Z cpu98:2097948)PCIEHP: 1048: 0000:a8:01.0: Disabling hotplug slot:0x1
2023-07-31T05:51:45.234Z cpu3:2097947)PCIEHP: 1477: 0000:a8:01.0: hotplug slot:0x1 (0000:a9:00.0) Adapter inserted.
2023-07-31T05:51:45.337Z cpu3:2097947)PCIEHP: 380: 0000:a8:01.0: hotplug slot:0x1: Setting PowerIndicator State BLINKING
2023-07-31T05:51:45.338Z cpu3:2097947)PCIEHP: 982: 0000:a8:01.0: Enabling hotplug slot:0x1
2023-07-31T05:51:45.348Z cpu3:2097947)AMDIommu: 996: IOMMU 0000:a0:00.2: Prepared IOMMU for hotplug device 0000:a9:00.0
2023-07-31T05:51:45.348Z cpu3:2097947)WARNING: PCIEHP: 641: 0000:a8:01.0: hotplug slot: 0x1: Device insertion detected while prior device 0000:a9:00.0 removal is still pending
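To check whether a host shows the same pattern, the log can be filtered for the hot-plug messages seen in the excerpt above. The following is a minimal sketch, not a vendor-supplied tool: the function name show_hotplug_events is hypothetical, and the default path /var/log/vmkernel.log is an assumption about where the host keeps its vmkernel log.

```shell
#!/bin/sh
# Print PCIe hot-plug removal/insertion messages from a vmkernel log.
# Assumption: the log lives at /var/log/vmkernel.log unless a path is given.
show_hotplug_events() {
    log_file="${1:-/var/log/vmkernel.log}"
    # Match the three message types from the excerpt: adapter removed,
    # adapter inserted, and insertion while a removal is still pending.
    grep -E 'PCIEHP: .*(Adapter removed|Adapter inserted|removal is still pending)' "$log_file"
}
```

Running show_hotplug_events with no argument reads the default path. A removal followed by an insertion on the same slot, with timestamps close to the failed power-on, is consistent with the behavior described here.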
This is a known issue documented by HPE: https://support.hpe.com/hpesc/public/docDisplay?docId=a00121002en_us
To avoid the issue, disable PCIe device hot-plug on the VMware ESXi host installed on the server:
1. On the bare metal ESXi host, enter the command:
2. Reboot the ESXi host.
3. Verify that PCIe device hot-plug is disabled by entering the command:
4. Confirm that "FALSE" is displayed under the Runtime column.
5. After this setting is changed, VMs with GPUs in VMware passthrough mode start and run properly.
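The exact commands for steps 1 and 3 should be taken from the HPE advisory linked above. As a hedged sketch of what those steps might look like from the ESXi shell, the functions below use the esxcli system settings kernel namespace. The setting name enablePCIEHotplug is an assumption and should be confirmed against the advisory, as is the column order used to read the Runtime value. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only. SETTING is an assumed name for the PCIe hot-plug kernel option;
# confirm it against the HPE advisory before running this on a host.
SETTING="enablePCIEHotplug"

# Step 1: disable PCIe hot-plug. The change applies after the reboot in step 2.
disable_pcie_hotplug() {
    esxcli system settings kernel set -s "$SETTING" -v FALSE
}

# Steps 3 and 4: print the Runtime value of the setting.
# Assumes the list output columns are:
#   Name  Type  Configured  Runtime  Default  Description
pcie_hotplug_runtime() {
    esxcli system settings kernel list -o "$SETTING" |
        awk -v name="$SETTING" '$1 == name { print $4 }'
}
```

Under these assumptions, disable_pcie_hotplug would be run once before the reboot, and pcie_hotplug_runtime would be expected to print FALSE afterwards.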